AITopics | manual feature engineering

AdaRec: Adaptive Recommendation with LLMs via Narrative Profiling and Dual-Channel Reasoning

Wang, Meiyun, Polpanumas, Charin

arXiv.org Artificial Intelligence, Nov-11-2025

We propose AdaRec, a few-shot in-context learning framework that leverages large language models for adaptive personalized recommendation. AdaRec introduces narrative profiling, transforming user-item interactions into natural language representations to enable unified task handling and enhance human readability. Centered on a bivariate reasoning paradigm, AdaRec employs a dual-channel architecture that integrates horizontal behavioral alignment, which discovers peer-driven patterns, with vertical causal attribution, which highlights the decisive factors behind user preferences. Unlike existing LLM-based approaches, AdaRec eliminates manual feature engineering through semantic representations and supports rapid cross-task adaptation with minimal supervision. Experiments on real e-commerce datasets demonstrate that AdaRec outperforms both machine learning models and LLM-based baselines by up to eight percent in few-shot settings. In zero-shot scenarios, it achieves up to a nineteen percent improvement over expert-crafted profiling, showing effectiveness for long-tail personalization with minimal interaction data. Furthermore, lightweight fine-tuning on synthetic data generated by AdaRec matches the performance of fully fine-tuned models, highlighting its efficiency and generalization across diverse tasks.
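The idea of narrative profiling, turning user-item interactions into natural language, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Interaction` record, its fields, the `narrative_profile` function, and the sentence template are all hypothetical, chosen only to show how an interaction log might be rendered as text that an LLM prompt could consume.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """One user-item event. All field names are hypothetical."""
    item: str
    category: str
    action: str
    rating: Optional[int] = None


def narrative_profile(user_id: str, interactions: List[Interaction]) -> str:
    """Render an interaction log as one readable sentence (illustrative template)."""
    if not interactions:
        return f"User {user_id} has no recorded interactions."
    clauses = []
    for event in interactions:
        clause = f"{event.action} {event.item} ({event.category})"
        if event.rating is not None:
            clause += f" and rated it {event.rating}/5"
        clauses.append(clause)
    return f"User {user_id} " + "; ".join(clauses) + "."


# Invented example data, not taken from any dataset in the paper.
history = [
    Interaction("trail shoes", "footwear", "purchased", 5),
    Interaction("rain jacket", "outerwear", "viewed"),
]
profile_text = narrative_profile("u1", history)
print(profile_text)
```

A text profile like this replaces hand-built numeric feature columns; whether it helps in practice depends on the model and prompt, which this sketch does not address.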

adarec, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.07166

Genre: Research Report (0.64)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

OpCode-Based Malware Classification Using Machine Learning and Deep Learning Techniques

Saini, Varij, Gupta, Rudraksh, Soni, Neel

arXiv.org Artificial Intelligence, Apr-21-2025

This technical report presents a comprehensive analysis of malware classification using OpCode sequences. Two distinct approaches are evaluated: traditional machine learning using n-gram analysis with Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree classifiers; and a deep learning approach employing a Convolutional Neural Network (CNN). The traditional machine learning approach establishes a baseline using handcrafted 1-gram and 2-gram features from disassembled malware samples. The deep learning methodology builds upon the work proposed in "Deep Android Malware Detection" by McLaughlin et al. and evaluates the performance of a CNN model trained to automatically extract features from raw OpCode data. Empirical results are compared using standard performance metrics (accuracy, precision, recall, and F1-score). While the SVM classifier outperforms other traditional techniques, the CNN model demonstrates competitive performance with the added benefit of automated feature extraction.
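The handcrafted 1-gram and 2-gram features mentioned above can be sketched as simple n-gram counts over an opcode sequence. The snippet below is an illustration under stated assumptions, not the report's code: the function names `opcode_ngrams` and `feature_vector` and the sample opcode trace are invented, and the resulting count vectors are only the kind of input a classifier such as an SVM might be trained on.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams in one disassembled opcode sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    grams = (tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
    return Counter(grams)


def feature_vector(opcodes: Sequence[str],
                   vocabulary: List[Tuple[str, ...]],
                   n: int) -> List[int]:
    """Map a sequence to counts over a fixed, ordered n-gram vocabulary."""
    counts = opcode_ngrams(opcodes, n)
    return [counts[gram] for gram in vocabulary]


# Hypothetical opcode trace, invented for illustration only.
sample = ["mov", "push", "call", "mov", "push", "ret"]

unigrams = opcode_ngrams(sample, 1)
bigrams = opcode_ngrams(sample, 2)

# In practice the vocabulary would be built from the training set only.
vocab = sorted(bigrams)
vector = feature_vector(sample, vocab, 2)
print(vocab)
print(vector)
```

The contrast drawn in the abstract is that a CNN learns comparable local patterns from the raw opcode stream, so this counting step does not have to be designed by hand.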

artificial intelligence, machine learning, yazdinejad, (17 more...)

arXiv.org Artificial Intelligence

2504.13408

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Time-aware Metapath Feature Augmentation for Ponzi Detection in Ethereum

Jin, Chengxiang, Zhou, Jiajun, Jin, Jie, Wu, Jiajing, Xuan, Qi

arXiv.org Artificial Intelligence, Oct-30-2022

With the development of Web 3.0, which emphasizes decentralization, blockchain technology is undergoing a revolution that also brings numerous challenges, particularly in the field of cryptocurrency. Recently, a large number of criminal behaviors, such as Ponzi schemes and phishing scams, have emerged on blockchain, severely endangering decentralized finance. Existing graph-based abnormal behavior detection methods on blockchain usually focus on constructing homogeneous transaction graphs without distinguishing the heterogeneity of nodes and edges, resulting in partial loss of transaction pattern information. Although existing heterogeneous modeling methods can depict richer information through metapaths, the extracted metapaths generally neglect temporal dependencies between entities and do not reflect real behavior. In this paper, we introduce Time-aware Metapath Feature Augmentation (TMFAug) as a plug-and-play module to capture the real metapath-based transaction patterns during Ponzi scheme detection on Ethereum. The proposed module can be adaptively combined with existing graph-based Ponzi detection methods. Extensive experimental results show that TMFAug can help existing Ponzi detection methods achieve significant performance improvements on the Ethereum dataset, indicating the effectiveness of heterogeneous temporal information for Ponzi scheme detection.

artificial intelligence, machine learning, metapath, (19 more...)

arXiv.org Artificial Intelligence

2210.16863

Country:

Asia > China > Zhejiang Province > Hangzhou (0.05)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Manual Feature Engineering

#artificialintelligence, Sep-19-2019, 15:53:18 GMT

There is also a complementary Domino project available. Many data scientists deliver value to their organizations by mapping, developing, and deploying an appropriate ML solution to address a business problem. Feature engineering is useful for data scientists when assessing tradeoff decisions regarding the impact of their ML models. It is a framework for approaching ML as well as providing techniques for extracting features from raw data that can be used within the models.

As Domino seeks to help data scientists accelerate their work, we reached out to AWP Pearson for permission to excerpt the chapter "Manual Feature Engineering: Manipulating Data for Fun and Profit" from the book, Machine Learning with Python for Everyone by Mark E. Fenner. Many thanks to AWP Pearson for providing the permissions to excerpt the work and enabling us to provide a complementary publicly viewable Domino project.

We are going to turn our attention away from expanding our catalog of models [as mentioned previously in the book] and instead take a closer look at the data. Feature engineering refers to manipulation--addition, deletion, combination, mutation--of the features. Remember that features are attribute-value pairs, so we could add or remove columns from our data table and modify values within columns. Feature engineering can be used in a broad sense and in a narrow sense. I'm going to use it in a broad, inclusive sense and point out some gotchas along the way.

Two drivers of feature engineering are (1) background knowledge from the domain of the task and (2) inspection of the data values. The first case includes a doctor's knowledge of important blood pressure thresholds or an accountant's knowledge of tax bracket levels. Another example is the use of body mass index (BMI) by medical providers and insurance companies. While it has limitations, BMI is quickly calculated from body weight and height and serves as a surrogate for a characteristic that is very hard to accurately measure: proportion of lean body mass. Inspecting the values of a feature means looking at a histogram of its distribution. For distribution-based feature engineering, we might see multimodal distributions--histograms with multiple humps--and decide to break the humps into bins.

A major distinction we can make in feature engineering is when it occurs. Our primary question here is whether the feature engineering is performed inside the cross-validation loop or not.
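Three of the ideas in this excerpt can be sketched in a few lines of plain Python: a domain-driven derived feature (BMI), threshold-based binning, and a transformation whose parameters are learned inside a cross-validation fold. This is a minimal illustration, not code from the book. The function names are invented, the cut points 120 and 140 are placeholder values rather than clinical guidance, and the standard formula BMI = weight in kilograms divided by height in meters squared is assumed.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def bmi(weight_kg: float, height_m: float) -> float:
    """Derived feature from domain knowledge: weight / height^2."""
    return weight_kg / height_m ** 2


def bin_index(value: float, thresholds: Sequence[float]) -> int:
    """Return how many of the sorted thresholds the value meets or exceeds."""
    return sum(value >= t for t in thresholds)


def standardize_in_fold(train: Sequence[float],
                        test: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Learn mean and spread from the training fold only, then apply to both.

    Computing these statistics on the full dataset before splitting would
    let information from the held-out fold leak into training.
    """
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant feature

    def scale(values: Sequence[float]) -> List[float]:
        return [(v - mu) / sigma for v in values]

    return scale(train), scale(test)


# Invented example values.
patient_bmi = bmi(70.0, 1.75)
pressure_bin = bin_index(135, [120, 140])  # placeholder cut points
train_scaled, test_scaled = standardize_in_fold([1.0, 3.0], [5.0])
print(round(patient_bmi, 1), pressure_bin, train_scaled, test_scaled)
```

The BMI and binning steps use no statistics learned from the data, so it does not matter whether they run before or inside the cross-validation loop. The scaling step does learn from the data, which is why it is fitted per fold here.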

engineering, feature engineering, intercept, (13 more...)

#artificialintelligence

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Automating artificial intelligence for medical decision-making

#artificialintelligence, Sep-16-2019, 04:57:16 GMT

MIT CSAIL researchers are hoping to accelerate the use of artificial intelligence to improve medical decision-making, by automating a key step that's usually done by hand -- and that's becoming more laborious as certain datasets grow ever-larger. The field of predictive analytics holds increasing promise for helping clinicians diagnose and treat patients. Machine-learning models can be trained to find patterns in patient data to aid in sepsis care, design safer chemotherapy regimens, and predict a patient's risk of having breast cancer or dying in the ICU, to name just a few examples. Typically, training datasets consist of many sick and healthy subjects, but with relatively little data for each subject. Experts must then find just those aspects -- or "features" -- in the datasets that will be important for making predictions.

artificial intelligence, machine learning, medical decision-making, (11 more...)

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automating artificial intelligence for medical decision-making

#artificialintelligence, Sep-2-2019, 04:59:04 GMT

MIT computer scientists are hoping to accelerate the use of artificial intelligence to improve medical decision-making, by automating a key step that's usually done by hand -- and that's becoming more laborious as certain datasets grow ever-larger. The field of predictive analytics holds increasing promise for helping clinicians diagnose and treat patients. Machine-learning models can be trained to find patterns in patient data to aid in sepsis care, design safer chemotherapy regimens, and predict a patient's risk of having breast cancer or dying in the ICU, to name just a few examples. Typically, training datasets consist of many sick and healthy subjects, but with relatively little data for each subject. Experts must then find just those aspects -- or "features" -- in the datasets that will be important for making predictions.

artificial intelligence, machine learning, medical decision-making, (11 more...)

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automating artificial intelligence for medical decision-making

#artificialintelligence, Aug-7-2019, 02:53:43 GMT

MIT computer scientists are hoping to accelerate the use of artificial intelligence to improve medical decision-making, by automating a key step that's usually done by hand--and that's becoming more laborious as certain datasets grow ever-larger. The field of predictive analytics holds increasing promise for helping clinicians diagnose and treat patients. Machine-learning models can be trained to find patterns in patient data to aid in sepsis care, design safer chemotherapy regimens, and predict a patient's risk of having breast cancer or dying in the ICU, to name just a few examples. Typically, training datasets consist of many sick and healthy subjects, but with relatively little data for each subject. Experts must then find just those aspects--or "features"--in the datasets that will be important for making predictions.

artificial intelligence, machine learning, medical decision-making, (11 more...)

#artificialintelligence

Country:

North America > United States > Massachusetts (0.15)
North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.49)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Where Are We with Computer Vision? - insideBIGDATA

#artificialintelligence, Jan-3-2018, 17:33:10 GMT

In the past several years, we've witnessed how deep learning, specifically convolutional neural networks, has been successfully applied to computer vision, natural language processing, speech recognition, logistics, online advertising, and many other problem domains. A few things are unique about the application of deep learning to computer vision, and understanding these characteristics will help in understanding the state of the field. In this article, I'd like to share a summary of the state of computer vision from Course 4, "Convolutional Neural Networks," of the new Deep Learning Specialization series on Coursera. Dr. Andrew Ng provides some compelling observations about deep learning and computer vision with the goal of mapping out the future of this increasingly popular technology. Consider that many machine learning problems fall somewhere on a spectrum that runs from working with "small data" to having "big data." For example, there is a decent amount of data available for speech recognition.

artificial intelligence, machine learning, manual feature engineering, (15 more...)

#artificialintelligence

Industry: Education (0.77)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Wide Learning: Experiments in Healthcare

Banerjee, Snehasis, Chattopadhyay, Tanushyam, Biswas, Swagata, Banerjee, Rohan, Choudhury, Anirban Dutta, Pal, Arpan, Garain, Utpal

arXiv.org Machine Learning, Dec-21-2016

In this paper, a Wide Learning architecture is proposed that attempts to automate the feature engineering portion of the machine learning (ML) pipeline. Feature engineering is widely considered the most time-consuming portion of any ML task and the one that demands the most expert knowledge. The proposed feature recommendation approach is tested on 3 healthcare datasets: a) the PhysioNet Challenge 2016 dataset of phonocardiogram (PCG) signals, b) the MIMIC II blood pressure classification dataset of photoplethysmogram (PPG) signals, and c) an emotion classification dataset of PPG signals. While the proposed method beats the state-of-the-art techniques for the 2nd and 3rd datasets, it reaches 94.38% of the accuracy level of the winner of PhysioNet Challenge 2016. In all cases, the effort to reach satisfactory performance was drastically less (a few days) than with manual feature engineering.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

1612.0573

Country:

Asia > India (0.15)
Europe > Spain (0.14)

Genre: Research Report (0.85)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Therapeutic Area > Hematology (0.89)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.89)
Health & Medicine > Diagnostic Medicine > Imaging (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
